{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-colab"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/pathwaycom/pathway/blob/main/examples/notebooks/tutorials/suspicious_user_activity.ipynb\" target=\"_parent\"><img src=\"https://pathway.com/assets/colab-badge.svg\" alt=\"Run In Colab\" class=\"inline\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "notebook-instructions",
      "source": [
        "# Installing Pathway with Python 3.10+\n",
        "\n",
        "In the cell below, we install Pathway into a Python 3.10+ Linux runtime.\n",
        "\n",
        "> **If you are running in Google Colab, please run the colab notebook (Ctrl+F9)**, disregarding the 'not authored by Google' warning.\n",
        "> \n",
        "> **The installation and loading time is less than 1 minute**.\n"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "pip-installation-pathway",
      "source": [
        "%%capture --no-display\n",
        "!pip install --prefer-binary pathway"
      ],
      "execution_count": null,
      "outputs": [],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "logging",
      "source": [
        "import logging\n",
        "\n",
        "logging.basicConfig(level=logging.CRITICAL)"
      ],
      "execution_count": null,
      "outputs": [],
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "id": "2",
      "metadata": {
        "lines_to_next_cell": 0
      },
      "source": [
        "# Real-Time Anomaly Detection: identifying brute-force logins using Tumbling Windows\n",
        "\n",
        "In this tutorial you will learn how to perform a [tumbling window](/glossary/tumbling-window) operation to detect suspicious activity.\n",
        "\n",
        "Your task is to detect suspicious user login attempts during some period of time.\n",
        "You have a record of login data. Your goal is to detect suspicious users who have logged in more than 5 times in a single minute.\n",
        "\n",
        "To do this, you will be using the `windowby` syntax with a `pw.temporal.tumbling()` object. Let's jump in!\n",
        "\n",
        "Your input data table has the following columns:\n",
        "* `username`,\n",
        "* whether the login was `successful`,\n",
        "* `time` of a login attempt,\n",
        "* `ip_address` of a login.\n",
        "\n",
        "\n",
        "Let's start by ingesting the data:\n",
        "\n",
        "\n",
        "First ingest the data."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "3",
      "metadata": {},
      "outputs": [],
      "source": [
        "# Uncomment to download the required files.\n",
        "# %%capture --no-display\n",
        "# !wget https://public-pathway-releases.s3.eu-central-1.amazonaws.com/data/suspicious_users_tutorial_logins.csv -O logins.csv"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "4",
      "metadata": {},
      "outputs": [],
      "source": [
        "import pathway as pw\n",
        "\n",
        "# To use advanced features with Pathway Scale, get your free license key from\n",
        "# https://pathway.com/features and paste it below.\n",
        "# To use Pathway Community, comment out the line below.\n",
        "pw.set_license_key(\"demo-license-key-with-telemetry\")\n",
        "\n",
        "\n",
        "\n",
        "class InputSchema(pw.Schema):\n",
        "    username: str\n",
        "    successful: str\n",
        "    time: int\n",
        "    ip_address: str\n",
        "\n",
        "\n",
        "logins = pw.io.csv.read(\n",
        "    \"logins.csv\",\n",
        "    schema=InputSchema,\n",
        "    mode=\"static\",\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "5",
      "metadata": {},
      "source": [
        "The CSV data has the string values \"True\" and \"False\" in the `successful` column.\n",
        "\n",
        "Let's convert this to a Boolean column:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "6",
      "metadata": {},
      "outputs": [],
      "source": [
        "logins = logins.with_columns(successful=(pw.this.successful == \"True\"))"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "7",
      "metadata": {},
      "source": [
        "Then, let's filter attempts and keep only the unsuccessful ones."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "8",
      "metadata": {},
      "outputs": [],
      "source": [
        "failed = logins.filter(~pw.this.successful)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9",
      "metadata": {},
      "source": [
        "Now, perform a tumbling window operation with a duration of 60 (i.e. 1 minute).\n",
        "\n",
        "Use the `instance` keyword to separate rows by the `ip_address` value."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "10",
      "metadata": {},
      "outputs": [],
      "source": [
        "result = failed.windowby(\n",
        "    failed.time, window=pw.temporal.tumbling(duration=60), instance=pw.this.ip_address\n",
        ").reduce(\n",
        "    ip_address=pw.this._pw_instance,\n",
        "    count=pw.reducers.count(),\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "11",
      "metadata": {
        "lines_to_next_cell": 0
      },
      "source": [
        "...and finally, let's keep only the IP addresses where the number of failed logins exceeded the threshold (5):"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "12",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | ip_address    | count\n",
            "^3HJ3BS8... | 50.37.169.241 | 7\n"
          ]
        }
      ],
      "source": [
        "suspicious_logins = result.filter(pw.this.count >= 5)\n",
        "pw.debug.compute_and_print(suspicious_logins)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "13",
      "metadata": {},
      "source": [
        "And that's it! You have used a tumbling window operation to identify suspicious user activity and can now act on this information to increase the security of your platform.\n",
        "\n",
        "Reach out to us on [Discord](https://discord.gg/pathway) if you'd like to discuss [real time anomaly detection](/glossary/real-time-anomaly-detection) use cases like this one in more detail!"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.14"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}